
perf: Rework ExecuteReaderAsync to minimize allocations #528

Merged: 5 commits, Jul 21, 2020

Conversation

@Wraith2 (Contributor) commented Apr 16, 2020

Benchmarking against the usual DataAccessPerformance project, I've got all the way down to the once-per-call allocations now. ExecuteDataReader uses callback delegates and closures which can be cached, eliminated, or minimized in many cases.

The AAsyncCallContext class is very similar to the one already used internally in SqlDataReader because the pattern of allocation and caching is similar. Once this PR is reviewed and merged I intend to go back and rebase the SqlDataReader internal classes on this new base class.

The changes here come in a number of patterns:

  1. replace an instance-bound delegate, which must be allocated on each call, with a cached static delegate that takes the instance as a new first parameter and bounces the call back through to an instance method.
  2. move continuations into state functions and pass state as parameters wherever possible, to allow caching and avoid closures.
  3. replace compiler-generated closures (which always capture at method scope) with an explicit, limited-scope state object.
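Pattern 1 might look something like the following sketch (the class and method names here are illustrative, not the actual SqlClient code):

```csharp
using System;
using System.Threading.Tasks;

public class Worker
{
    // Allocating version: the lambda captures 'this', so a new closure
    // object and delegate are allocated on every call.
    public Task<int> RunAllocating() => Task.Run(() => Execute());

    // Cached version: one static delegate shared by all instances. The
    // instance travels through the state parameter, so nothing is captured
    // and the delegate is allocated exactly once, at type initialization.
    private static readonly Func<object, int> s_execute =
        state => ((Worker)state).Execute();

    public Task<int> RunCached() => Task.Factory.StartNew(s_execute, this);

    private int Execute() => 42;
}
```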

I also changed a void-returning function that was altering a ref parameter into one that returns the new value, because the more familiar pattern makes it easier to understand what it's doing.
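A minimal before/after sketch of that ref-parameter change (hypothetical names, not the actual SqlClient code):

```csharp
using System.Threading.Tasks;

internal static class TaskHelpers
{
    // Before: void-returning, mutates the caller's variable through ref.
    internal static void WrapOld(ref Task<int> task)
    {
        task = task.ContinueWith(t => t.Result * 2);
    }

    // After: returns the new task; the caller assigns the result,
    // which is the more familiar shape.
    internal static Task<int> WrapNew(Task<int> task)
    {
        return task.ContinueWith(t => t.Result * 2);
    }
}
```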

Some of the allocations eliminated are shown in the attached dotMemory screenshot (Capture2); there are some more off the bottom of the screenshot.

The same work can be done for ExecuteScalar eventually.

@DavoudEshtehari (Member) left a comment

Can I ask you to describe how you got this report, and what you mean by the DataAccessPerformance project?

@Wraith2 (Contributor, Author) commented Jun 24, 2020

The DataAccessPerformance project emulates the TechEmpower fortunes benchmark and I was pointed to it as a good performance based benchmarking solution. If you look at my history of pull requests here and in corefx you'll see I use it a lot and have been gradually improving performance and memory usage using it.

The screenshot is from dotMemory, which is a memory profiler. I highlighted the items that I've affected in red using Paint.


```csharp
private Task<DbConnectionInternal> CreateReplaceConnectionContinuation(Task<DbConnectionInternal> task, DbConnection owningConnection, TaskCompletionSource<DbConnectionInternal> retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionPoolGroup poolGroup, CancellationTokenSource cancellationTokenSource)
{
    return task.ContinueWith(
```
Member

Small comment @Wraith2: ContinueWith generally performs much worse than a simple async method with await (not to mention that it's more brittle wrt exception handling etc.). I can post some benchmarking if you want.
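To illustrate the general remark (this is not the SqlClient code itself, just a sketch of the two styles):

```csharp
using System;
using System.Threading.Tasks;

internal static class ContinuationStyles
{
    // ContinueWith: allocates a continuation delegate per call; a faulted
    // antecedent surfaces as AggregateException when t.Result is touched.
    internal static Task<int> LengthContinueWith(Task<string> task) =>
        task.ContinueWith(t => t.Result.Length);

    // async/await: the compiler-generated state machine completes
    // synchronously when the task is already done, and the original
    // exception type is preserved on fault.
    internal static async Task<int> LengthAsync(Task<string> task)
    {
        string s = await task;
        return s.Length;
    }
}
```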

Member

Though this may be totally justified if TaskContinuationOptions.LongRunning is important here - just making the general remark.

Contributor Author

Language-supported async/await would be easier to understand as well.
Whoever wrote the original code had some reason to use imperative constructs instead of language support, but I've no idea what it was.

Member

Language supported async await would be easier to understand as well.

Absolutely.

Whoever wrote the original code had some reason to use imperative constructs instead of language support but I've no idea what it was.

I suspect a lot of the code was simply written before async/await was introduced (in .NET 4.5, relatively "late"). I'd definitely consider replacing the callback approach as you work through the code.

Contributor Author

Fine with me but it's down to the MS team to make that sort of decision. I don't want to make working on this library any harder than it already is so I try to replicate the existing style wherever it's sensible to do so in my changes.

There are several library wide things that would be nice to do.

  1. Merge the netfx and netcore codebases into a single solution with documented reasons for any differences
  2. Change how async methods are implemented with snapshots to reduce performance issues.
  3. Change from imperative async to language async.

We've got 1 in progress, but the others really need 1 to be done and judged stable so we don't have to duplicate a lot of work. 1 will also bring perf improvements done in netcore to netfx.

Contributor

I vote for prioritizing 1. It is currently quite a confusing codebase, which makes community contributions harder than they should be.

Contributor Author

Working on it bit by bit in #625. The easy part is the identical files; the complex part is the files which have opposing or complex changes.

Member

Sure, this kind of cleanup can be done separately of course. I'd suggest that if ContinueWith continuations are being rewritten for unrelated reasons anyway, it may be a good opportunity to just do that with async/await, but of course that's up to the team and you.

I also totally agree that merging the netfx/netcore codebases is a high-priority task.

```diff
@@ -1864,14 +1873,22 @@ private static void ChangePassword(string connectionString, SqlConnectionString
 // this only happens once per connection
 // SxS: using named file mapping APIs

-internal void RegisterForConnectionCloseNotification<T>(ref Task<T> outerTask, object value, int tag)
+internal Task<T> RegisterForConnectionCloseNotification<T>(Task<T> outerTask, object value, int tag)
```
Member

This could be a candidate for rewriting with simple async/await

Contributor Author

@David-Engel @cheenamalhotra I can give this a try in the PR if you like or we can leave it for a later date, what do you think?

Member

Are there more similar cases that we would like to address?

Contributor Author

The original code took a ref Task, added a continuation to it, and then returned the new wrapper task in the same variable. It doesn't do any awaiting, and if the caller were rewritten to use language async then this function would be totally rewritten or removed as well. I'd prefer not to touch it further in this PR and to work on it at a stage when a larger scale transition from continuation to language async is being done.

@cheenamalhotra cheenamalhotra added the 📈 Performance Use this label for performance improvement activities label Jun 25, 2020
@DavoudEshtehari (Member)

Can I ask you to produce some information to illustrate how these changes affect performance?

@johnnypham johnnypham self-assigned this Jul 8, 2020
@Wraith2 (Contributor, Author) commented Jul 11, 2020

Using Northwind and the benchmark code:

```csharp
[Benchmark]
public async Task<int> AsyncReadOrders()
{
    int max = 0;

    using (SqlDataReader reader = await command.ExecuteReaderAsync())
    {
        while (await reader.ReadAsync())
        {
            int value = reader.GetInt32(0);
            if (value > max)
            {
                max = value;
            }
        }
    }

    return max;
}
```
| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|
| AsyncReadOrders (branch) | 353.9 μs | 6.93 μs | 9.71 μs | - | - | - | 2.15 KB |
| AsyncReadOrders (master) | 349.1 μs | 6.92 μs | 8.24 μs | - | - | - | 2.59 KB |

It's a really tiny benchmark, as you can see from the times. It simply gets some ints using a data reader and reads them. The main push is the allocated memory. Very little of those allocations is user data; they're task machinery and library internals. The reduction you see is delegates, which are removed by caching static versions of them, and some async closures, which are removed by using direct state objects to avoid capture.

@johnnypham (Contributor)

I was able to verify the difference in allocations but am I correct in thinking that the benefits would mostly be lost when returning larger result sets for each call to ExecuteReaderAsync?

@Wraith2 (Contributor, Author) commented Jul 13, 2020

The benefits wouldn't be lost; they'd just be a smaller share of the total memory allocated if you do a lot of reading. If people are making an effort to do sensible things, chunky not chatty, then this is just a nice-to-have; if they're being bad at SQL and doing chatty calls, then this is nicer to have. It's never fundamental.

In an ideal situation I'd like the library to only allocate memory for the results that a user requested. That isn't practical, but if there is benefit to users for a relatively small increase in the complexity of the library, it's possibly worth taking. The reason I started working on this library is that I had written some code using the new pipelines and span APIs in netcore and had a really lean, fast project, and then when I started trying to get that data in and out of a database, using this library obliterated all the careful performance work I'd done. So yes, these are smaller gains, but they're still of value if you want to support high performance scenarios.

@cheenamalhotra cheenamalhotra added this to the 2.1.0-preview1 milestone Jul 16, 2020
@cheenamalhotra (Member)

@Wraith2 seems there are conflicts with recent merge of PR #499 - please resolve when you have time.

@Wraith2 (Contributor, Author) commented Jul 16, 2020

Merged and fixed up.

@cheenamalhotra cheenamalhotra merged commit 7b82e07 into dotnet:master Jul 21, 2020
@Wraith2 Wraith2 deleted the perf-registerclosestate branch August 7, 2020 21:30
panoskj added a commit to panoskj/SqlClient that referenced this pull request Aug 26, 2022
panoskj added a commit to panoskj/SqlClient that referenced this pull request Nov 21, 2022
panoskj added a commit to panoskj/SqlClient that referenced this pull request Nov 21, 2022
panoskj added a commit to panoskj/SqlClient that referenced this pull request Nov 21, 2022
panoskj added a commit to panoskj/SqlClient that referenced this pull request Nov 22, 2022